1. Source of data ℹ️

The data for the current study were sourced from the Atlas of Living Australia (ALA) website, an open data source supported by the Australian Government and hosted by the Commonwealth Scientific and Industrial Research Organisation (CSIRO). The occurrence records for wedge-tailed eagles (Aquila (Uroaetus) audax) across Australia can be accessed through this link to the dataset.

2. Usage permission ⚠️

The Atlas of Living Australia is an open data repository for information on the biodiversity of Australia. It is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy (NCRIS) and is hosted by the Commonwealth Scientific and Industrial Research Organisation (CSIRO).

The dataset is publicly available under a CC-BY attribution license. This license allows us to reuse, replicate, tweak and reproduce the data as long as we cite the original source of the data. Further details of the license can be found through the link to licensing details.

3. Data retrieval 🔧

a) Data download procedure

The process for obtaining the dataset and storing it in an R data (.rda) file is as follows:

  1. The dataset is publicly available on the Atlas of Living Australia (ALA) website.

  2. Typing “Wedge tailed eagle” in the search bar returns the available data along with the scientific name of the bird. The keyword used to obtain the data is this scientific name, “Aquila (Uroaetus) audax”. The step is shown in the image below for reference. The red box shows the keyword entered in the search bar, while the red arrow shows the link to the dataset used for the current study.

  3. The data retrieval is done using the “galah” library in R. The galah library allows us to download species occurrence records, taxonomic information, sounds and images, and to restrict queries to certain locations using latitude and longitude values. A detailed description of the package along with the installation process may be accessed here.

Once the galah library has been installed and set up in the RStudio environment, the code below is run to obtain the required dataframe. To limit the size of the dataframe, the following filters are applied to answer the questions outlined by the task.

  1. Only records dated from 1st January 2000 onwards were retained.
  2. The taxonomic name “Aquila (Uroaetus) audax” was used to obtain results exclusively for the occurrences of wedge-tailed eagles across regions in Australia.
  3. The Latitude, Longitude, recordID, eventDate, dataResourceName and occurrenceStatus variables were selected.

library(galah)
library(dplyr)   # provides %>%, rename(), mutate(), filter() and select()

# Set the email and download reason used for the ALA download
galah_config(email = "abar0090@student.monash.edu",
             download_reason_id = 10,
             verbose = TRUE)

# Download the occurrence records for the wedge-tailed eagle
eagles <- ala_occurrences(
  taxa = select_taxa("Aquila (Uroaetus) audax"))

eagles <- eagles %>%
  rename(Longitude = decimalLongitude,
         Latitude = decimalLatitude) %>%
  mutate(eventDate = as.Date(eventDate)) %>%
  # Drop records with a missing date or missing coordinates
  filter(!is.na(eventDate), !is.na(Longitude), !is.na(Latitude)) %>%
  # Keep records from 1st Jan 2000 to the latest data
  filter(eventDate >= as.Date("2000-01-01")) %>%
  # Select the relevant variables
  dplyr::select(Latitude, Longitude, recordID, eventDate,
                dataResourceName, occurrenceStatus)

head(eagles)

b) Data saving procedure

After the data has been queried successfully and obtained as a dataframe, it is saved to the R data file ‘eagles.rda’ using the save function, as shown in the R code chunk below.

save(eagles, file = here::here("data/eagles.rda")) #Saving dataframe into eagles.rda
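The saved file can later be read back with load(). The following is a minimal sketch of that round trip, assuming a toy data frame (eagles_demo) and a temporary file in place of the real data; both names are illustrative and not part of the original workflow.

```r
# Toy stand-in for the real `eagles` data frame (values are made up).
eagles_demo <- data.frame(
  Latitude  = c(-34.9, -23.4),
  Longitude = c(138.5, 144.3),
  eventDate = as.Date(c("2005-03-14", "2018-11-02"))
)

# Save to a temporary .rda file; the real file path would go here instead.
rda_path <- tempfile(fileext = ".rda")
save(eagles_demo, file = rda_path)

# load() restores objects under their original names, so load into a
# fresh environment to avoid overwriting anything in the workspace.
restored <- new.env()
loaded_names <- load(rda_path, envir = restored)

loaded_names                                   # names of the restored objects
identical(restored$eagles_demo, eagles_demo)   # checks the round trip
```

Because load() returns the names of the objects it restored, it is easy to confirm that the file contains the expected dataframe before using it.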

c) Data processing

Since the dataset is considerably large, it is important to create a subset of the data that retains as much information as possible without making the analysis messy or complex.

It was decided to obtain the wedge-tailed eagle sightings for all locations within a radius of roughly 100 kilometres of the Adelaide and Longreach airports. For this purpose, a resource for calculating the distance between two pairs of latitude and longitude values was used. The distance calculator can be accessed here. The interface of the distance calculator is shown in the image below.

Using the above calculator, a set of 6 latitude and longitude points approximately 100 km from each of the Adelaide and Longreach airports was obtained. The image below shows the 6 geolocation points around the Adelaide airport for reference.

From the above image, it can be observed that the locations within a 100 km radius of each airport can be approximated by a range of +/- 1 degree around the latitude and longitude of the airport location. Hence, we can apply the filter function of the dplyr library to create the subset of locations for which the presence of the eagles can be obtained.

# Adelaide airport

eagles_ade <- eagles %>%
  filter(Latitude >= -35.9285, Latitude <= -33.9285) %>%   #Filtering latitudes
  filter(Longitude >= 137.5, Longitude <= 139.5)           #Filtering longitudes

head(eagles_ade)

# Longreach airport

eagles_lon <- eagles %>%
  filter(Latitude >= -24.4403, Latitude <= -22.4403) %>%   #Filtering latitudes
  filter(Longitude >= 143.2506, Longitude <= 145.2506)     #Filtering longitudes

head(eagles_lon)
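As a rough check on the +/- 1 degree approximation, the distance covered by one degree can be computed directly. The sketch below uses the haversine formula with a mean Earth radius of 6371 km, which is an assumption of this example and not the online calculator used above; the function name haversine_km is illustrative. The box centres are taken from the filter bounds in the code above.

```r
# Great-circle (haversine) distance in km between two points given in degrees.
# earth_radius_km = 6371 is the commonly used mean Earth radius (assumption).
haversine_km <- function(lat1, lon1, lat2, lon2, earth_radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * earth_radius_km * asin(sqrt(a))
}

# Centres of the two filter boxes used for the airports.
centres <- data.frame(
  airport = c("Adelaide", "Longreach"),
  lat = c(-34.9285, -23.4403),
  lon = c(138.5, 144.2506)
)

# Distance covered by 1 degree of latitude and 1 degree of longitude at each centre.
centres$km_per_deg_lat <- haversine_km(centres$lat, centres$lon,
                                       centres$lat + 1, centres$lon)
centres$km_per_deg_lon <- haversine_km(centres$lat, centres$lon,
                                       centres$lat, centres$lon + 1)
centres
```

One degree of latitude is about 111 km at both sites, while one degree of longitude is about 91 km near Adelaide and about 102 km near Longreach. The filter therefore selects a rectangle that approximates the 100 km radius and does not trace an exact circle: it reaches slightly beyond 100 km to the north and south, falls slightly short to the east and west at Adelaide, and its corners lie further out than 100 km.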

4. Data description 📖

a) Variables

The data is in the file eagles.rda in the data directory. It contains these variables:

  • Latitude : Latitude value of the record obtained.
  • Longitude : Longitude value of the record obtained.
  • recordID : A unique identifier for each observation obtained.
  • eventDate : Date on which the observation was recorded.
  • dataResourceName : This variable helps us identify the source of the data.
  • occurrenceStatus : An indicator to mark whether the wedge-tailed eagle was spotted at the corresponding latitude and longitude on the particular event date.

b) Summary of the dataset

(b.1) Time period of data

The current dataset contains all the occurrences of wedge-tailed eagles across Australia with latitude and longitude values from 1st January 2000 to the latest updated data records.

(b.2) Population of the dataset

The dataset obtained from the ALA website provides all the records of presence (or absence) of the wedge-tailed eagle at the corresponding latitude and longitude. These spatial variables help us understand the population density of the bird in and around the airports of Adelaide and Longreach. Since we are required to assess the chances of bird strikes around these airports, we filter the dataset to the records of the wedge-tailed eagle within about 100 km of each airport. As the filtered data cover a considerable period, the sample is expected to be representative of the full set of records obtained from the ALA website.

(b.3) Selection of variables for the dataset

The reasons for selecting the variables in the current dataset are as follows:

  1. Latitude : This is one of the spatial variables that will help us pinpoint a geographical location where a record of the presence (or absence) of the wedge-tailed eagle was observed.

  2. Longitude : This is one of the spatial variables that will help us pinpoint a geographical location where a record of the presence (or absence) of the wedge-tailed eagle was observed.

  3. recordID : Helps us identify the number of unique observations of birds reported for a particular location and event date.

  4. eventDate : This is the date on which the record was obtained. A temporal analysis can be performed using this variable, as it would help us understand whether the presence of the bird has risen or dropped over the years in the locations close to the airports of Adelaide and Longreach.

  5. dataResourceName : This variable helps us understand the source of the data. This could be an important factor in our analysis, as the Atlas of Living Australia conducts regular data quality checks. If at any point the license of a data resource provider is revoked due to quality concerns, we can identify which observations to filter out of the data at a later point. ALA’s data quality project can be referred to through the link here.

  6. occurrenceStatus : Whether the record obtained for the particular geographical location on a given date observed the presence or absence of the wedge-tailed eagle.
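The temporal analysis mentioned for eventDate could start from a simple count of records per year. The sketch below is one possible way to do this with dplyr, using a small made-up data frame (sightings) in place of the filtered airport subsets; the object and column names sightings, year and n_records are illustrative.

```r
library(dplyr)

# Toy records standing in for an airport subset (values are made up).
sightings <- data.frame(
  recordID  = c("a1", "a2", "a3", "a4", "a5"),
  eventDate = as.Date(c("2001-05-10", "2001-09-21", "2010-01-03",
                        "2019-07-15", "2019-12-30"))
)

# Count records per calendar year derived from eventDate.
sightings_per_year <- sightings %>%
  mutate(year = as.integer(format(eventDate, "%Y"))) %>%
  count(year, name = "n_records")

sightings_per_year
```

A change in yearly counts may reflect a change in how many observations were submitted as well as a change in the presence of the bird, so such counts are best read alongside the limitations of the data.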

5. Limitations of the data ⭕

  1. Since the data obtained from ALA falls under the category of observational data, there could be instances of missing values. These have been filtered out while retrieving the data from the repository.

  2. Since the data is obtained from various sources, there could be inconsistencies in reporting the data.

  3. While the data collection was done in an extensive and granular manner, there could be a lack of precision of the observations made for the exact latitude and longitude.

  4. The dataset obtained from the Atlas of Living Australia only reports the wedge-tailed eagles that have been recorded. Hence, it does not report the entire population of these eagles across the selected regions of Australia.

  5. The dataset here is observational data and, in particular, occurrence data. Some of the limitations that are prevalent in such datasets are as follows:

    • The data here is obtained from various sources (in particular, the various carrier airlines). As a result, there may be non-uniformity in the data provided, as each airline may have its own interpretation of the data it provides.
    • The current dataset is a subset of observational data. Data of this type often lack randomisation in how records are selected. This may lead to biases in the dataset, such as selection bias and systematic bias.
    • The data may be misclassified or recorded in non-uniform units by the various sources, leading to a lack of accuracy in the overall dataset.
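Some of these inconsistencies can be screened for by tallying records per source and per status label. The sketch below is one possible check, using made-up records with placeholder resource names ("Source A" and so on); the object names records and status_by_source are illustrative.

```r
library(dplyr)

# Toy records (made-up values; resource names are placeholders).
records <- data.frame(
  dataResourceName = c("Source A", "Source A", "Source B", "Source B", "Source C"),
  occurrenceStatus = c("present", "present", "present", "absent", "PRESENT")
)

# Tally records by source and by status label, largest groups first.
status_by_source <- records %>%
  count(dataResourceName, occurrenceStatus, sort = TRUE)

status_by_source

# List the distinct status labels to spot inconsistent spellings or casing.
unique(records$occurrenceStatus)
```

A source that contributes most of the records, or a status label that appears in several spellings, would be a prompt to look more closely before analysis.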
